Explicit semantic analysis
In natural language processing and information retrieval, explicit semantic analysis (ESA) is a vectorial representation of text (individual words or entire documents) that uses a document corpus as a knowledge base. Specifically, in ESA, a word is represented as a column vector in the tf–idf matrix of the text corpus and a document (string of words) is represented as the centroid of the vectors representing its words. Typically, the text corpus is Wikipedia, though other corpora including the Open Directory Project have been used.
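A minimal sketch of this representation (assuming scikit-learn; the three-article toy corpus and the helper names word_vector and document_vector are hypothetical, for illustration only):
<syntaxhighlight lang="python">
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy "knowledge base" standing in for the Wikipedia corpus (hypothetical).
articles = [
    "the cat sat on the mat",
    "dogs and cats are common pets",
    "the stock market crashed yesterday",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(articles)  # rows: articles, columns: terms
vocab = vectorizer.vocabulary_              # term -> column index

def word_vector(word):
    # ESA vector of a word: its column in the tf-idf matrix,
    # i.e. one weight per article ("concept").
    return tfidf[:, vocab[word]].toarray().ravel()

def document_vector(text):
    # ESA vector of a document: the centroid of its words' vectors.
    words = [w for w in text.lower().split() if w in vocab]
    return np.mean([word_vector(w) for w in words], axis=0)

print(word_vector("cat"))               # weight of "cat" in each article
print(document_vector("cats and dogs")) # centroid over "cats", "and", "dogs"
</syntaxhighlight>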
ESA was designed by Evgeniy Gabrilovich and Shaul Markovitch as a means of improving text categorization, and the same researchers have used it to compute what they call "semantic relatedness" by means of cosine similarity between the aforementioned vectors, collectively interpreted as a space of "concepts explicitly defined and described by humans", where Wikipedia articles (or ODP entries, or, more generally, the titles of documents in the knowledge-base corpus) are equated with concepts.
The name "explicit semantic analysis" contrasts with latent semantic analysis (LSA), because the use of a knowledge base makes it possible to assign human-readable labels to the concepts that make up the vector space.
==Model==
To perform the basic variant of ESA, one starts with a collection of texts, say, all Wikipedia articles; let the number of documents in the collection be ''N''. These are all turned into "bags of words", i.e., term-frequency histograms, stored in an inverted index. Using this inverted index, one can find, for any word, the set of Wikipedia articles containing it; in the vocabulary of Egozi, Markovitch and Gabrilovich, "each word appearing in the Wikipedia corpus can be seen as triggering each of the concepts it points to in the inverted index."
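As a concrete sketch of this step (plain Python; the toy documents are hypothetical), the following builds the bags of words and the inverted index mapping each word to the articles it "triggers":
<syntaxhighlight lang="python">
from collections import Counter, defaultdict

# Hypothetical toy collection standing in for the N Wikipedia articles.
docs = {
    0: "the cat sat on the mat",
    1: "dogs and cats are common pets",
    2: "the stock market crashed yesterday",
}

# Bags of words: one term-frequency histogram per document.
bags = {doc_id: Counter(text.split()) for doc_id, text in docs.items()}

# Inverted index: word -> {doc_id: term frequency}.
inverted = defaultdict(dict)
for doc_id, bag in bags.items():
    for word, freq in bag.items():
        inverted[word][doc_id] = freq

# Each word "triggers" the concepts (articles) it points to in the index.
print(inverted["cat"])  # {0: 1}
print(inverted["the"])  # {0: 2, 2: 1} -- absent from document 1
</syntaxhighlight>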
The output of the inverted index for a single-word query is a list of indexed documents (Wikipedia articles), each given a score depending on how often the word in question occurred in it (weighted by the total number of words in the document). Mathematically, this list is an ''N''-dimensional vector of word-document scores, where a document not containing the query word has score zero. To compute the relatedness of two words, one compares their vectors (say '''u''' and '''v''') by computing the cosine similarity,
:<math>\mathsf{sim}(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\| \, \|\mathbf{v}\|} = \frac{\sum_{i=1}^{N} u_i v_i}{\sqrt{\sum_{i=1}^{N} u_i^2} \, \sqrt{\sum_{i=1}^{N} v_i^2}}</math>

and this gives a numeric estimate of the semantic relatedness of the words. The scheme is extended from single words to multi-word texts by simply summing the vectors of all the words in the text.
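Putting the pieces together, a minimal sketch of the relatedness computation (plain Python with NumPy; length-normalised term frequency stands in for the scoring described above, and the corpus is again a hypothetical toy):
<syntaxhighlight lang="python">
import numpy as np

def scores(word, docs):
    # N-dimensional vector of word-document scores: occurrence count
    # weighted by document length, zero where the word is absent.
    vec = np.zeros(len(docs))
    for i, text in enumerate(docs):
        tokens = text.split()
        vec[i] = tokens.count(word) / len(tokens)
    return vec

def relatedness(w1, w2, docs):
    # Cosine similarity between the two words' score vectors.
    u, v = scores(w1, docs), scores(w2, docs)
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

docs = [
    "the cat sat on the mat",
    "dogs and cats are common pets",
    "the stock market crashed yesterday",
]
print(relatedness("cat", "mat", docs))    # 1.0: occur in exactly the same article
print(relatedness("cat", "stock", docs))  # 0.0: no article in common
</syntaxhighlight>
A multi-word text would be handled by summing (or averaging) the score vectors of its words before taking the cosine.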

Source: Wikipedia, the free encyclopedia. The full text of "Explicit semantic analysis" can be read on Wikipedia.